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- ■ Abstract: We study the background cosmology of the ghost-free, bimetric theory of 

Q_j gravity. We perform an extensive statistical analysis of the model using both frequentist and 

Bayesian frameworks and employ the constraints on the expansion history of the Universe 
from the observations of supernovae, the cosmic microwave background and the large scale 
structure to estimate the model's parameters and test the goodness of the fits. We explore 
the parameter space of the model with nested sampling to find the best-fit chi-square, obtain 
the Bayesian evidence, and compute the marginalized posteriors and mean likelihoods. We 
mainly focus on a class of sub-models with no explicit cosmological constant term to assess 
the ability of the theory to dynamically cause a late-time accelerated expansion. The 
model behaves as standard gravity without a cosmological constant at early times, with 
an emergent extra contribution to the energy density that converges to a cosmological 
constant in the far future. The model can in most cases yield very good fits and is in 
perfect agreement with the data. This is because many points in the parameter space of 
the model exist that give rise to time-evolution equations that are effectively very similar 
to those of the ACDM. This similarity makes the model compatible with observations as 
in the ACDM case, at least at the background level. Even though our results indicate a 
slightly better fit for the ACDM concordance model in terms of the p-value and evidence, 
none of the models is statistically preferred to the other. However, the parameters of 
the bigravity model are in general degenerate. A similar but perturbative analysis of the 
model as well as more data will be required to break the degeneracies and constrain the 
parameters, in case the model will still be viable compared to the ACDM. 
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1. Introduction 



Ever since the first proposal by Fierz and Pauli |l|, giving gravity a mass has seemed 
like a theoretically appealing extension of gravity. However, the fact that these theories 
contained ghost instabilities leading to the wrong limits in systems such as the solar system 
Q rendered them unattractive, and research endeavors into massive gravity theories lay 
for the most part abandoned for decades.^ 

Recently this has changed as new nonlinear interactions free of ghosts have been dis- 
covered, described and explored by de Rham, Gabadadze, Tolley and others ||^, ^, ^, |7[ |8|, 
|, 0, |ll|, [l|, H, g, m, H, 0, H, m, m, H, m, H, m, H. This formulation of 
massive gravity used a fixed extra two-tensor with no dynamics of its own ||Tl| and does 



not have stable homogeneous and isotropic solutions |17, 22 



A generalization to having two free and dynamic metrics in the formulation was then 



proposed and showed a theoretically viable option by Hassan and Rosen in |27, p^, 29 1 



^For a recent review of massive gravity see 
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Various theoretical aspects of this interesting theory have been explored for instance in 
|3C, 32, 33, 34, 3^ 37, 35, 40 1. In particular the cosmology has been explored in 



1 41] which emphasized the energy contribution of the extra field, in [42| that described 
the Priedmann-Lemaitre-Robertson- Walker (FRW) solutions and sorted them into two 
branches and in |43] that looked at data constraints in some specialized corners of the 
parameter space. In [^] the cosmology of the theory in the setting where the fundamental 
metric is homogeneous and isotropic while the other is inhomogeneous was explored. In 
1 45] the cosmological perturbation equations for this theory were given for the first time 
and these were reformulated and explored further in 



and ]47]. Using an alternative 
approach, it was recently claimed that the cosmological solutions exhibit an instability for 
some parameter combinations ]48]. Such instabilities were however not observed^ in the 



studies J 46, 47] that used standard cosmological perturbation theory. 

Although the background cosmology of this theory with two homogeneous and isotropic 



metrics has been explored in j41, 42, 43], none of them has given an exhaustive parameter 
estimation and exploration of the full parameter space in comparison with data in order 
to find whether this theory has new dynamics yielding better or comparable fits to those 
of the standard model of cosmology (A Cold Dark Matter; ACDM; for an introduction, 
see e.g. ]49, 50]). This is what this paper endeavors to do. It is quite interesting to 
study precisely how viable these models finite mass of a graviton would provide a 

well-motivated and theoretically consistent explanation for the enigmatic speed-up of the 



late-time expansion of the Universe (for a review, see e.g. ]51]). In particular, were the 
data to prefer the higher order interactions of graviton that give it a mass over the constant 
term, one could also hope to shed new light on the long-standing cosmological constant 
problem [^. For a large collection of other possible dynamical models of dark energy 
and modified gravity that have been proposed in attempts to address the issue of cosmic 
acceleration, see e.g. the reviews [| 



54,m. 



The paper is organized as follows; In section |2| we give a brief theoretical introduction 
to bigravity theory and in particular its formulation in two homogeneous and isotropic 
metrics. In section ^ we describe the observational data that we used to compare with 
predictions of the model in different parts of parameter space. We also present a handy 
reformulation that we used for the numerical integration of the equation set and how we 
performed this integration. In addition we describe the statistical methods we used to 
perform the parameter estimation. In section ^ we describe the results of our analysis 
and parameter estimation when a method for numerical integration of the equation set 
was established. To better understand the dynamics of the solutions we first present the 
analysis of certain specific parts of the parameter space, and in this way we demonstrate 
how the system becomes degenerate, how good fits to the data can be obtained, but also 
how the search for a best-fit model reveals that a theory which mimics the ACDM as closely 
as possible is preferred, and the closeness^ to the ACDM that is obtainable in these theories 



^This is possibly because the "backward instabilities" reported in correspond to decaying modes. 
In any case, in our comprehensive background analysis, we found no instabilities, which strongly suggests 
that at least the homogeneous modes are stable in these cosmologies. 

^Which is in fact arbitrarily good in more than one corner of parameter space. 
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is just what makes them such good fits to the data. 



2. The model 

2.1 Bimetric theory of massive gravity 

In this section, we briefly review the ghost-free, bimetric theory of gravity without restrict- 
ing the framework to any particular choices of metrics. We start from the action level and 
present the modified field equations for the dynamical variables of the theory. 

The theory contains two space-time metrics. The first one, which we denote by is 
assumed to be the physical metric, namely the metric based on which all usual distances in 
cosmology are defined. The other metric, denoted by f^^, is a dynamical rank-two tensor 
that is essential for the theory to be ghost-free; this metric when coupled to the physical 
metric, gives mass to gravitons^ 

The action for our ghost-free, massive theory of gravity was first presented in p8| . 
We will follow the notation of everywhere in the paper, and most of this introductory 
section (and the next section) will follow that paper quite closely. Let us begin with the 
action which has the following form: 



M r M r 

S = -^J d'x^/^dit^i^(5) - ^ y d^xy^^d^Rif) 

4 

+m'Ml j d*x^-detg^l3nen (7^) + j d^xy" - det gCm (s,^)- (2.1) 



Here, the matrix a/ g~^f is defined such that \/g^^f\/g^^f = g^^f^iX, R{g) and R{f) 
are the Ricci scalars for the metrics g and /, respectively, Mg and Mj denote the Planck 
masses corresponding to the two metrics, and Cm {g, ^) shows the Lagrangian for the 
matter sector. In addition, e„(X) are elementary symmetric polynomials of the eigenvalues 
of the matrix X and have the following forms: 



eo(X) = l, ei(X) = [X] , (X) = ^ ([X]' - [X^]) , 

es (X) = \ (y.f - 3 [X] [X2] + 2 [X3] ) 64 (X) = det (X) , (2.2) 



where square brackets denote the traces of the matrices X. The small m in eq. (2.1) is the 
graviton mass and the five quantities /3„ (n = 0, ...,4) are free parameters that need to be 
determined observationally. 

The equations of motion for the two metrics (or the equivalents to the Einstein equa- 
tions) can be derived by varying the action with respect to the metrics; this gives: 



*For other recent developments in bimetric and biconnected spacetimes, see ^ |60| . 
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2 3 



n=0 



R 



n=0 



(2.3) 
0, 

(2.4) 



where over-bar denotes quantities corresponding to the / metric, T^u is the stress-energy- 
momentum tensor, = Mj/Mg, and 

y(o)(x) = i, (X) = X - 1 [X] , y^^) (x) = x^ - x[x] + ([x]^ - [x^]) , 

y(3) (X) = - X^ [X] + ^X (^[X]2 - [X2] ) - ^1 ([X]^ - 3 [X] [X2] + 2 [X^] ) . (2.5) 



As in standard gravity, these equations determine the dynamics of the space-time degrees 
of freedom, i.e. the geometry and time-evolution of the Universe, if the properties of matter 
and energy are known (through the tensor T^j/). 

As observed in [E^, we can perform the constant rescahng fi^^, — )• j^fuu and /3j — )• 

I /3j in order to set to unity. This quantity is therefore not a free parameter of 
the theory; we will drop it in the rest of the paper. We also assume Mg to be the usual 
(reduced) Plank mass Mp;. 

In addition to the equations of motion, further constraints are imposed on the dy- 
namics of the two metrics by Bianchi identities and the assumption that the stress-energy- 
momentum tensor of the matter components is conserved: 



n=0 



(2.6) 



V'^^(-l)"/34-n U/^ +/.Ay("„), (^/^ 



n=0 



0. 



(2.7) 



As we will see in the next section, this gives us an extra piece of information which 
dramatically simplifies the evolution equations when applied to the entire Universe. 

2.2 The case for an isotropic and homogeneous universe 

In the previous section, we described the full theory of ghost-free bimetric gravity where 
we did not assume any particular forms for the metrics. The purpose of the present paper 
is however to compare the cosmological predictions of the model to real data. A fundamen- 
tal assumption in the standard concordance model of cosmology, the ACDM, is that the 
Universe on large scales is spatially isotropic and homogeneous. This assumption restricts 
the metric of the Universe to be of the FRW form. Based on the same observational and 
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theoretical reasons, we follow [43| and assume both of the two metrics g and / in our 
bigravity model exhibit spatial isotropy and homogeneity. As we will see, this assumption 
leads to a set of generalized Friedmann equations that we will use in comparing the back- 
ground dynamics of the Universe, predicted by the model, to different types of cosmological 
measurements. 

The spatially isotropic and homogeneous metrics g and / read 



dsi 



ds) 



-dt^ + 



dr^ 



kr"^ 

dr^ 
1 — kr^ 



+ {de^ + sin^ 

+ r^de^ + sm^ed<p^) ] , 



(2.8) 



where a{t) and Y{t) are the time-dependent spatial scale factors corresponding to the 
metrics g and /, respectively. X{t) is a time-dependent function that must be determined 
through the evolution equations given by the model. We have additionally assumed the 
same spatial curvature for the two metrics {k = — 1,0, +1). 

In order to have consistent solutions in this case, where the matter sources are time- 
dependent, eqs. (^]^) and ( p.Tl) require the following relationship between the functions 
X{t), a{t) and Y{t) 



X = - = —, 
d da ' 



(2.9) 



where over-dot denotes derivative with respect to time. As we mentioned earlier, this 
relation is necessary for the conservation of the stress-energy-momentum tensor through 
the Bianchi identity of g. 

As in the ACDM case, we assume that the Universe is filled with dust, with the 
energy density pm, and radiation, with the energy density (other components can be 
added easily) . The equations of motion (^^) and (^^) then give the following generalized 
Friedmann equations ^: 



, a\ k 
3( - +3^ 
a J 



m 



/3o + 3/3i-+3/32^+/33^ 
a o'^ 



{pm + P7) 

g 



(2.10) 



+ m 



Y 



/3o + /3: 




+ 3- 



+ /33 



2 3 

/34 + 3^3f + 3/32^ + /3i^ 



y2y 

a^d 



0, 



1 



3M| 



Pi, 



(2.11) 
(2.12) 



m 



^ , a d\ ^/a^ ad \ „ a^d 
/34 + /33 ( 2- + ^ j + [y, + 2^ j + 



aa 



YY 



^Note that eq. (2.11) is slightly different from the equivalent equation in 
in that paper which has been corrected here. 



k 



0. (2.13) 



Y2 

this could be a misprint 
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3. Numerical investigation of the parameter space 



This section is about our strategy and methods we use to compare the predictions of the 
bigravity model to a set of cosmological observations, assess how well they match and then 
constrain the parameters of the model. We first, in section |3.1| , introduce the dataset we 
use and the cosmological quantities that have to be computed theoretically in order to 
make the comparison with observations. Section |3.2| describes all the simplified dynamical 
equations for the bigravity theory that we employ in our numerical investigation. It also 
shows more clearly what dynamical variables play the most important roles in the evolution 
equations. This in addition helps us to understand the physics of the model in comparison 



to the standard model. We then continue our description of the model in section 3.3 with 



a discussion about the initial conditions we need to set for the evolution equations in our 



numerical implementation of the model. Finally, in section |3.4| we focus on the statistical 
aspects of our work and review different statistical frameworks we employ, as well as our 
strategy for exploring the parameter space of the model. This clarifies how we aim to 
compare the model's predictions to the real data in practice. Readers who are familiar 
with or not interested in such statistical and scanning techniques can skip section \iA\ and 
continue with our results and discussions in section ^. 

3.1 Constraints from cosmology 

Cosmological data that are used in comparing predictions of cosmological models to obser- 
vations are classified into two main categories: 1) constraints from measuring the geometry 
and background evolution of the Universe on large scales (expansion history), and 2) con- 
straints from the formation, distribution and evolution of structures in the Universe (growth 
history). In order to see whether a model is viable observationally, one usually starts with 
the background cosmology. This is also simpler to study since the background equations 
are considerably simpler to derive and implement numerically. Studying the structures 
requires the field equations to be perturbed around the background (FRW metrics in our 
case). Here, we only work with the background dynamics and leave the investigation of 
the model at the perturbation level for future work. 

Three main sources of information in cosmology are 1) anisotropics of the Cosmic 
Microwave Background (CMB) Radiation, 2) Baryon Acoustic Oscillations (BAO), and 
3) Type la Supernovae (SNe la). At the background level, observational constraints on a 
theoretical model provided by these sources all involve calculations of different types of cos- 
mological distances. In general, such distances depend on the parameters of the model and 
by comparing them to the measured distances one can constrain the model and determine 
how successful the model is in describing the Universe. In what follows, we briefly review 
the distance measures important in CMB, BAO and SNe la observations, how they are cal- 
culated from a cosmological model and how they are related to the actual cosmological data. 

• Cosmic Microwave Background: In order to properly extract the information from 
the tiny fluctuations on the CMB one usually looks at the angular distribution of the 
fluctuations through the computation of the angular power spectrum. One can on the other 
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hand derive the theoretical power spectrum by solving some Boltzmann codes numerically 
and fit the model to the data by comparing the two spectra. The latter needs perturbative 
equations to be solved. There is however one important quantity that can be measured 
from the observed CMB power spectrum and only depends on the background equations: 
the position of the first peak on the spectrum (denoted by I a)- This represents the angular 
scale of the sound horizon at the recombination epoch. Since we are working only with the 
background equations in this paper, we adhere to this quantity to place constraints on our 
model. One can show that 

(1 + Zr)DAiZr) .„^^ 

'^=''^^) — ' ^'-'^ 

where DAizr) is the angular-diameter distance to the CMB last-scattering surface, i.e. at 
the redshift of recombination Zr- rs{zr) is the co- moving sound horizon at Zr- Theoretically, 
Da and as functions of redshift z can be calculated from the following expressions (we 
assume a flat universe, i.e. A; = 0): 

1 dz' 
(1 + z) Jo H{z') 
[°° dz' 

rs{z) = J^ csj^y (3.3) 

Here, c is the speed of light, c<j is the speed of sound before recombination and H{z) is 
the Hubble parameter at a given redshift z. The latest measurements of the CMB power 
spectrum by the satellite Wilkinson Microwave Anisotropy Probe (WMAP) |^] has de- 
termined the value of Ia to be 302.56 it 0.78. In addition, the value we assume for Zr is 



1091.12; we neglect the uncertainties about this value [43|. 



• Baryon Acoustic Oscillations: As in the CMB case, in order to extract full infor- 
mation from the distribution and evolution of large-scale structures in the Universe, one 
needs to look at the perturbative equations. This involves fitting the theoretical power 
spectrum of matter to the observed one. There are however simpler ways to work only at 
the background level. One such possibility is to consider the ratio of the sound horizon at 
the so-called drag epoch (with the redshift of Zd) to another quantity called dilation scale 
(denoted as Dy) at some arbitrary redshifts. Zd is the redshift of an epoch at which the 
baryon acoustic oscillations are frozen in (z^ ~ 1020). The theoretical value of the dilation 
scale Dy at any redshift z can be obtained from the expression 

Dv(z) = [££(l^)!5d(£)!,V^ ,3,4) 

In the present analysis, we use the values of this ratio when Dy is measured at redshifts 
0.106, 0.2, 0.35, 0.44, 0.6 and 0.73. These measurements have been done by the galaxy 
surveys 2dFGRS, 6dFGS, SDSS and WiggleZ (H, [||, ||) and are given in table |l[ 



In applying the CMB and BAO measurements mentioned above to our model, we fol- 



low the strategy used in [43| where the BAO measurements of rs{Z(i)/Dv{z) at different 
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z = 0.106 


2 = 0.2 


2 = 0.35 


2 = 0.44 


2 = 0.6 


2 = 0.73 




0.336 ±0.015 


0.1905 ±0.0061 


0.1097 ±0.0036 


0.0916 ±0.071 


0.0726 ± 0.0034 


0.0592 ± 0.0032 




30.96 ± 1.48 


17.55 ± 0.64 


10.11 ±0.37 


8.44 ±0.67 


6.69 ±0.33 


4.46 ± 0.31 



Table 1: Measured values of and (1 ± Zr)^j^^^ at different redshifts ([|62|, 63 



redshifts are multiplied by the CMB measurement of Ia in order to reduce the model de- 
pendence of the constraints. Assuming the measured value of 1.0451 it 0.0158 for the ratio 
rs{z^)/rs{zr) reported by WMAP collaboration, the combined constraints (from CMB and 
BAO) used in our analysis are the ones we have given in table m. 



• Type la Supernovae: Luminosity measurements of SNe la have proven to be a powerful 
source of information about the geometry and evolution of the Universe at late times. After 



the striking discovery of the accelerated expansion in 1998 |65, they have received much 



attention as standard candles in observational cosmology. The key quantity in using SNe 
la in cosmological model analysis is the luminosity distance Dl{z) which can be computed 
theoretically for a model and constrained directly from the SNe observations: 



Dl{z) = {l + z) c 



dz' 



(3.5) 



The dataset we use in our analysis is the Union2.1 compilation of SNe |67] which 
contains 580 SNe in total. We include the reported systematic uncertainties for the mea- 
surements given in that paper. 



• Present value of Hubble parameter {Hq): Hq is a quantity that appears in all 
expressions used in comparing predictions of a theoretical model to the real data. It is 
therefore important to know what value it has in reality. We use the value provided by 
seven-year WMAP observations, i.e. Hq = 71 ± 2.5 km s~^ Mpc~^. In ACDM cosmology, 
Hq is a free parameter of the model that is to be constrained observationally. As we will 
see in the next section, in our bigravity model, it is a prediction of the model when other 
parameters are fixed. 

3.2 Hubble parameter and evolution equation 

We saw in the previous section (eqs. (|3.2D, ( p.3[ ), ( |3.4| ) and (^)) that all quantities used in 
constraining a model through cosmological observations can be calculated theoretically only 
if the Hubble parameter H is known during the evolution of the Universe. For the standard 
ACDM model, this can be calculated in terms of the usual cosmological parameters such as 
various density parameters. In our bimetric theory, we have assumed that one metric (we 
chose it to be g) is the physical one and this is the metric that should be used in defining 
different cosmological quantities including distances. The metric is additionally assumed 
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to be of FRW form with a time-dependent scale factor (a) as the only dynamical quantity 
which determines the evolution of the Universe. This all means that we can follow the 
standard recipe and use the Hubble parameter defined based on that scale factor in our 
calculations of observable quantities (i.e. H = a /a). Our next task is therefore to find a 
time evolution equation for the Hubble parameter. 

As usual, we define the redshift z as z = a^/ a—1, where ao is the scale factor at present 
time (which is chosen to be unity) . This gives the possibility of calculating H as a function 
of z and using it directly in the expressions for cosmological distances. In addition, it turns 
out that working with the variable y = Y/a (where Y is the scale factor corresponding 
to the metric /) significantly simplifies the equations used in numerical calculations. In 
fact, as we will demonstrate below, all the interesting equations and dynamical variables 
(including the Hubble parameter) can be written in terms of y (which we expect to be 
a time-dependent quantity). For numerical integration, as will be discussed below, it is 
additionally useful to reformulate the equations in terms of the e- folding time x = In a. This 
transforms the time derivative ^ into the e-folding time derivative: ^ = = H-^, 

which we will denote by ^ = '. 

Before giving the expression for H in terms of y and parameters of our model, we 
perform one more redefinition, this time on the Pi parameters. It is obvious already from 
the action (eg. ( |2.lD ) that the parameter m? is redundant since it just multiplies all the /3s. 
We can therefore redefine (3s such that they absorb m?. However, since m is presumably 
quite small (something of the order of the present value of the Hubble parameter, Hq), 
we further normalize the /3 parameters to the observed value of Hq (i.e. 71 it 2.5 km s~^ 
Mpc~^). This gives us a set of new parameters that we call Bs: Bi = m?/3i/HQ. Note that 
the exact value of Hq is not important here and we use it only for simplicity reasons, i.e to 
work with a set of parameters that are likely to have values of not more than a few orders 
of magnitude larger or smaller than unity. As we will see later, the present value of the 
Hubble parameter is not a free parameter of the theory and will be predicted by the model 
in terms of the other parameters. 

After including all the aforementioned modifications into eq. ( p.lO| ), we get an equation 
for the Hubble parameter as follows ^: 



the energy density for the component i). 

It can be seen from this expression that in order to compute the Hubble parameter at 
any given time during the cosmic evolution, one needs to know the values of the parameters 

^Here, and in the rest of this section, we include the curvature term in the equations for a general fc-value. 
As we will mention later, in the present paper we however analyze the bigravity model only for the A; = 
case (flat Universe); this value has been chosen for simplicity reasons and is also a justified assumption 
based on the current constraints on the curvature of the Universe from analyses of the ACDM concordance 
model. Including the curvature term is however a very straightforward procedure and we leave it for future 
work. 




■7! 



(3.6) 



where we have defined the density parameters as usual, i.e. = pi/{2>H[ 



'iM^) {pi being 
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Bq, ...jB^, as well as the dynamical quantities ^7 and y. Matter and radiation follow 
the standard evolutions with redshift according to their traditional equations of state, i.e. 
Qm = ^m(l + -2)'^ and = + z)^ . We have not included an explicit cosmological con- 
stant density parameter JIa in the equation because the parameter /3o, and correspondingly 
Bq, act as a cosmological constant; including the quantity Q\ is therefore redundant. 

As the next step in solving eq. ( p. 61 ) for H{z), we need to find what the values of y are 
at different times (or redshifts). Using eqs. ( p.lO| ) - (2.13) we obtain the following equation 



for y' (derivative of the /-metric's normalized scale factor y with respect to x 

-2Bo + Bsy^ + 3nra + + , + 6B2 + SB^y 

exp(2x) 



y' = {i 



+3B, (1 - y^) }/{Bo + + + ^^^^ (3,^ - l) 

+6Biy + 3B2 (3y2 _ l) + 2S32/ {2y^ - 3) - 3B^y^^ - y. (3.7) 



This shows that the dynamical variable y has a one-dimensional phase space, meaning 
that knowing its value at any given time is sufficient for obtaining its values at all other 
times; as we will see later, we use this differential equation to determine the value of y 
(and accordingly H) at different redshifts. We will discuss in the next section how we can 
obtain the value of y at a particular redshift to set an initial value for eq. ( |3.7D . 

Before continuing with setting the initial conditions, let us take a closer look at the 
equations of motion (2.10) - ( p. 13 ) to see whether there is any alternative approach in 



calculating the Hubble parameter as a function of redshift. The answer "seems" to be 
"yes" : the equation set yields another equation for the Hubble parameter (if at least one 
of the parameters Bi, B2, B^ or i?4 is nonzero): 

772 = 772 — -y +B3y + B2 + ^y-\ (3.8) 



Eqs. ( |3.6| ) and ( |3.8| ) are enough to set the values of the system at all times. We can 
see this by solving eq. (|3.8D for and plugging it into eq. (|3.6D . This combination of eqs. 
( |3.6| ) and ( p.SD yields a quartic equation in y"^: 

El I ^ B^ 

3 ^ ^ V exp {2x) Hi 3 



-r + ( ^2 + -^]y''+{B,-B,)y' 

+ + + ^ ^ .B2)y-^ = 0. (3.9) 

* 3 iiQexp(23;) / 3 

With equation ( ^^ ) we could in principle calculate the values of y at any redshifts 
without integrating the system through the entire history of the Universe. However, since 
a "quartic" equation can in general have as many as four real solutions, it is not necessarily 
easy to choose the right root at each given point: "is this root the one that the root from 
the previous point would have evolved to?" To insure that this is the case, we instead use 
the differential equation ( |3.7D to solve for y. This also guarantees real solutions throughout 



^DifTerentiating this equation witii respect to x gives the difTerential equation (3.7) 
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Akrami, Koivisto & Sandstad (2012) 




Figure 1: An example of how y evolves with redshift z for a model that is well fitted to the real 
data. This is obtained for a bigravity model with Bq — B2 — — B4 = 0, Bi = 1.44 and 



fiJli = 0.307 (see section 4.2.1 ). We see that y starts out at a zero- value fixed-point in the past 
and transitions to a larger-value fixed-point at late times. The dashed vertical line illustrates the 
present time (z = 0). 



the cosmic history since starting with a real solution, a differential equation with only real 
coefficients is guaranteed to yield solutions that will remain real at all times. 

By studying eq. ( |3.9D , we can immediately deduce what form the asymptotic solutions 
will take. At early times the terms containing $7^ and il.^ will dominate since they evolve 
according to = ^m(l + z)^ and = 0,^(1 + z)^, respectively. Deep in the past the 
equation will then approximately read as 

{n^ + n^)y = o, (3.10) 

so that y = 0, eliminating the contributions from y in eq. (^.6|). Hence, at early times, we 
will have the usual ACDM model with Q\ = Bq/3. As time progresses the other terms 
become important, and y starts varying with time. At late times, and the curva- 

ture terms become negligible, and y will be given in terms of a time-independent quartic 
equation. This means that the Universe will have transitioned again into a cosmological 
constant phase, but now with a different value of the constant. An example of this behavior 
can be seen in fig. |^ where we show y as a function of redshift {z = —1 corresponds to 
far in the future). We therefore expect the model to give good fits to the cosmological 
observations as long as the change in y is not dramatically large and the Hubble parameter 
evolves in a way similar to that of the ACDM model. 

3.3 Setting initial conditions 

To perform the numerical integration of eq. ( |3.7D , we start from the present time and 
integrate into the past. This is done by solving the quartic equation ( p. 91 ) at z = and 
then using eq. ( |3.7D to obtain y at all interesting redshifts throughout the evolution of the 
Universe. One could in principle solve eq. ( p. 9]) at any other arbitrary redshift instead, 
but choosing z = is particularly useful in practice for simplicity reasons, as well as to 
set the initial conditions stably. In addition, it makes it possible to compute co-moving 
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distances belonging to each redshift of interest in the same integration code as the one 
used for computing the Hubble parameter. This makes the statistical analysis of the model 
faster and the numerical calculations more rational. Now that the starting values of y and 
H (i.e. at z = 0) are determined for a set of model parameters using eqs. ( |3.9| ) and (p.^), 
the numerical integration can commence using eq. ( |3.7| ) and the Hubble parameter can be 
calculated at all redshifts using eq. (^). 

Some caveats however need to be considered for this scheme since the quartic equation 
generically has several solutions: Firstly, all values of y should not be accepted, y is the 
scale factor of a metric (/) normalized to the scale factor of another metric (^f), and it 
therefore seems meaningless for y to take on complex or negative values. Therefore only 
real and positive roots of eq. ( |3.9| ) should be considered; this means that many choices of 
parameters yield no viable solutions. Secondly, we see from eq. Q and <^ that H"^ and 
not H is the quantity that is directly calculated from y\ we should therefore ensure that it 
receives positive values. We will see in more detail in the next section how we implement 
these conditions numerically. And thirdly, for a single set of parameters of the model, 
there may be several viable roots for eq. ( |3.9| ). Again, as we will see later, this subtlety is 
taken care of by treating the initial value of y as an additional parameter in our statistical 
analysis. This parameter can in general take on 0, 1, 2, 3 or 4 different values at each 
point in the model's parameter space (depending on the number of allowed solutions for 
the quartic eq. ( ^.91) ). Hence, in exploring the parameter space for the best-fit parameter 
values, we should consider all the allowed initial values of y, perform the analysis and then 
choose the initial y that gives the best fit. 

3.4 Statistical framework, scanning strategy and comparison to observation 

So far, we have set up our theoretical model: we know how to solve the dynamical equations 
of the model, how to compute necessary observable quantities to be used in comparison 
of the theoretical predictions to various cosmological observables, and which data to use. 
What we intend to discuss in this section is how to confront the model with observation in 
practice, i.e. how to explore the parameter space of the model to test how fit it is to the 
data and what values of the parameters are preferred. Let us recap what we have found 
and discussed so far: 

• The full model has the following six free parameters: 

e = (/3o,/3i,/32,/33,/34,^^^). (3.11) 

Here, as we stated earlier, we have assumed a flat Universe (i.e. k = Q). In ad- 
dition, we have neglected the radiation contribution to the evolution equation and 
background observables, i.e. we have assumed ~ 0. This latter assumption can 
be justified by the fact that in the standard ACDM cosmology, the present energy 
density of radiation is vanishingly small (fJ^ = O(10~^)) compared to the matter 
contribution (Q^ = 0(1)). Its value then remains negligible up to the redshift of 
matter-radiation equality which goes well beyond the redshift range we are using in 
the present analysis. We expect a similar ratio between fiS? and in our bigravity 
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model which enables us to neglect radiation in the entire period of interest (from 
the present time to the epoch of recombination) as radiation and matter scale with 
redshift as (1 + z)^ and (1 + z)^, respectively. 

By fixing the values of all parameters 0, eqs. ( p.9D and ( |3.7| ) tell us how y, corre- 
sponding to that chosen point in the parameter space, evolves with time (or with 
redshift) . 

Eq. ( p.sp can then be used to calculate the Hubble parameter H as a function of z. 
Knowing H{z) at a given point in the parameter space is then enough for calculating 
all the quantities we need in order to compare the model (at that point) with the 



observations we described in section 3.1 



In order to constrain the model and test how fit it is to the real data, we need to 
perform a statistical analysis. This requires a scanning of the parameter space of the 
model and this can be done in various ways depending on which statistical framework one 
chooses to work with. In statistical parameter estimation and model selection, there are 
in general two different types of statistical inference that are fundamentally very different: 
"Bayesian" and "frequentist" statistics (for a general introduction to statistical inference, 
see e.g. ||6^, and for some discussions about the differences between the two approaches 



in data analysis, see |£9|, 7C, ^] and references therein). In presence of sufficient obser- 
vational data where the parameters of the model are strongly constrained, both statistics 
are proven to yield the same results. Any discrepancy between the inferences therefore 
shows that either we need more constraining data, or the numerical methods employed in 
exploring the parameter space are not accurate enough. In what follows we first briefly 
review the two approaches and their main ingredients used for both parameter estimation 
and model selection. We then discuss why it is essential to use both approaches in order 
to ensure that the results of the statistical inference are reliable and robust. We use both 
approaches in our analysis of the bigravity model in this paper. 

• Bayesian inference: This approach has now become among the standard tools in 
cosmological data analysis (for applications of Bayesian statistics in physics, see e.g. ||72|| , 
and for reviews of its applications in cosmology, see e.g. [^, 74, ^ 0)- Here, one can 



assign probabilities to the parameters of the model under consideration. Let us first define 
the following quantities: 1) P(0; H) = it{Q), which denotes our "prior" probability density 
function (PDF) and reflects our knowledge or prejudices about the values of the parameters 
before comparing the model with observations. 2) P{D\@; H) = C{Q), which is called the 
"likelihood" , and is the probability of obtaining the data D from the model parameters Q 
when it is considered as a function of 0. 3) P{@\D; H), or the "posterior" PDF, which 
represents the probability of the model parameters when the data D are used. Finally, 4) 
P{D;H) = Z, which is called the Bayesian "evidence" and is the probability of observing 
the data D when we integrate over the entire parameter space. Here, we have assumed H 
to be the model hypothesis under consideration which we have parameterize by 0. Bayes' 
Theorem then relates these quantities in the following way: 
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PfeiD:H)^£imjnEi^. (3.12) 

This expression simply shows how our prior degree of behef about different values of 
model parameters is updated when new observational data are used. Our updated knowl- 
edge, given in terms of the multi-dimensional PDF, P{Q\D; H), is the quantity of interest 
in Bayesian statistics through which various statistical statements can be made. One can 
for example construct different ("credible") intervals and regions in the parameter space 
corresponding to certain levels of confidence. In this framework, one-, two-, ... and (A^ — 1)- 
dimensional (where N is the total number of free parameters of the model) credible regions 
for any sub-space of the parameter space can then be easily constructed by integrating 
(or 'marginalizing') the full posterior PDF, P{Q\D; H), over all other parameters; this 
procedure then results in joint "marginal" PDFs for the parameters of interest. 

In Bayesian parameter estimation, one reports the "posterior means" of the parameters 
(the expectation values of the parameters according to the marginalized posterior) as their 
most-favored values. Uncertainties about the favored values (at a given confidence level) 
are then estimated by "probability mass" surfaces containing corresponding percentages of 
the total marginalized posterior PDF. 

In Bayesian model selection on the other hand, the quantity of interest is the evidence 

Z = P{D;H)= [ £(G)7r(G)(ie, (3.13) 
Jv 

where V is the entire A'^-dimensional parameter space of the model. Assume that we want 
to select the best-fit model between two hypotheses Hq and Hi. This can be done by 
looking at the ratio 

P{Hi\D) P{D\Hi)P{Hi) ZiP{Hi) 



P{Ho\D) P{D\Ho)P{Ho) ZoPiHoY ^^'^^^ 

where P{Hi\D) and P{Ho\D) are the posterior PDFs for Hi and Hq to be the true hy- 
potheses given the observed data, respectively, and P{Hi) and P{Hq) are the prior PDFs 
for the hypotheses before observing the data. The ratio P{Hi)/P{Hq) can be set to unity 
in case we have no preference for any of the two models a priori. In that case, one observes 
from eq. ( p. 14 ) that in order to see whether a model is preferred compared to another one 



one needs to evaluate Bayesian evidences for the two cases and look at their ratio. 

This is an interesting recipe for selecting between different theoretical models, but with 
two caveats: 1) The evidence ratio is useful only if the models at hand are equally moti- 
vated theoretically or based on previous observational constraints; this is not always true 
though. 2) One usually calculates the evidences numerically, meaning that the exact value 
of the integral ( 3.13| ) cannot be evaluated. In practice, one needs to choose some ranges 



for the parameters in the parameter space to scan over and this is effectively equivalent 
to imposing a prior. In most cases (at least when the likelihood has a Gaussian or nearly 
Gaussian shape), choosing larger ranges will give a smaller evidence (this can be seen also 
from the observation that the evidence is nothing but the average of the likelihood over 
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the parameter space, at least when flat priors are used). In cases where the alternative 
hypothesis Hi differs from the null hypothesis Hq in that it is only an extension of the 
Hq parameter space by adding new parameters (i.e. Hi contains Hq as a sub-space), the 
evidence comparison can be a very powerful method. The reason is that increasing the 
dimension of the parameter space yields a smaller evidence (see integral ( 3.13| )) unless the 



likelihood values improve in such a way that the increase in the volume of the parameter 
space is compensated (the evidence recipe therefore automatically implements Occam's 
razor). For completely different models however, one needs to know prior probabilities for 
the models, as well as the parameter ranges within each model. 

• Frequentist inference: In this approach to statistical data analysis, probabilities are 
assigned only to the data and cannot be assigned to model parameters. This means that 
the posterior PDF, P(Q\D), on which Bayesian parameter estimation is based, is com- 
pletely meaningless to frequentists. In this framework, one instead works only with the 
likelihood which by definition is the probability of the observed data for a fixed set of model 
parameters and is well-defined in the frequentist framework. 

Knowing the full, A'^-dimensional likelihood of the model, one can infer various statisti- 
cal properties of the model by constructing "confidence" intervals and regions (as opposed 
to the credible intervals and regions in Bayesian statistics). Perhaps the most common 
method for such constructions, that is numerically feasible enough, is the "profile likeli- 



hood" procedure |77, and references therein]. Here, instead of marginalizing over unwanted 
parameters, one maximizes (or profiles) the full likelihood along those parameters and ob- 
tains one-, two-, ... and {N — l)-dimensional profile likelihoods. The most-favored values 
for the parameters are then given by the values that maximize the full likelihood, and un- 
certainties upon those values (at a given confidence level) are constructed by iso-likelihood 
contours in the model parameter space around the best-fit point. For Gaussian-like like- 
lihoods (which are always the case if sufficient data are available), the profile likelihood 
method is a good approximation to the exact frequentist construction of confidence inter- 
vals and regions proposed by Neyman | [78| . One method that provides exact confidence 
regions and intervals even for complex (non-Gaussian) parameter spaces (although harder 
to implement numerically), is the "confidence belt" construction scheme [^9| . 

In frequentist inference, in order to assess how fit a theoretical model is to the ob- 
served data, one again works with the full likelihood. Let us assume that the number of 
data points (measurements) used in the analysis is N{D) and the model has N{Q) free 
parameters. In addition, we assume that each measured data point follows a Gaussian 
distribution (an assumption that is approximately true for the cosmological data we use in 
the present paper). One then expects the (= — 21n>C) at the best-fit (highest-likelihood) 
point to follow a chi-squared distribution with N{D) — N{Q) degrees of freedom if one re- 
peats the measurements ideally an infinite number of times. Calculating the p- value (the 
probability of obtaining a as large as, or larger than the one actually observed, assuming 
that the model is true) corresponding to the observed then provides a powerful tool to 
assess the goodness of fit of the model to the data. 
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• Scanning of the parameter space: Our discussions so far about the statistical frame- 
works have made it clear that no matter which strategy we use, one quantity that we 
need to estimate accurately is the likelihood of the model as a function of the free pa- 
rameters. Mapping of the likelihood correctly is therefore the first essential step in both 
Bayesian and frequentist statistics. Especially in the latter approach, it is all we need to 
know: the globally maximum likelihood value provides us with a goodness-of-fit test of the 
model (through calculating the p- value at the best-fit point), as well as the favored values 
of parameters and uncertainties around them (through the profile likelihood construction 
and iso- likelihood contours). For the Bayesian framework however, one needs to take one 
step further and calculate the evidence (for model comparison) and the posterior PDF (for 
parameter estimation). 

Simple grid scans are obviously the most straightforward methods in mapping the pos- 
teriors, but they are notoriously slow when implemented numerically for high-dimensional 
parameter spaces. An alternative approach that has become highly popular in cosmo- 
logical data analysis is based on the Markov Chain Monte Carlo (MCMC) methods (for 
an introduction and overview of MCMC methods, see |8C], and for their applications in 
cosmology, see e.g. |81|) which revolve around the idea that the density of the points ob- 
tained in the numerical exploration of the parameter space be proportional to the actual 
probability function that one aims to map. This provides a simple way of marginalizing 
over any set of parameters (required by Bayesian statistics) just by counting the number of 
sample points corresponding to a point in the selected sub-set of the full parameter space. 
MCMCs provide a largely improved scanning efficiency in comparison to grid searches. 

More recently, another scanning algorithm based on the framework of nested sampling 
fS^ , |83|| , called MultiNest has attracted considerable attention in both particle 

physics and cosmology. The primary purpose of this method is to calculate the Bayesian 
evidence for a model, but it also provides the posterior distribution as a byproduct. It has 
been claimed that the results of the MultiNest and the MCMC parameter estimations are 
identical (up to numerical noise), while the former is two orders of magnitude faster than 
the latter. The technique is clearly optimized for Bayesian inference, but in the absence of a 
competitive alternative, it can be used also for frequentist analyses if the model parameter 
space is not far too complex with many spike-like, fine-tuned regions. We do not expect our 
cosmological model to be of this sort and therefore choose MultiNest as the statistical tool 
in our present analysis (for an alternative algorithm optimized for frequentist statistics, see 
e.g. [H). 

As we mentioned earlier, the results of Bayesian and frequentist statistics do agree if 
the information provided by data is so strong that the likelihood dominates over prior con- 
tributions to the posterior distribution. In this case, the credible and confidence regions will 
coincide providing unique statistical conclusions about the model parameters. In Bayesian 
statistics, the set of prior assumptions play an important role in the final inference. In 
cases where such effects are dominant, the statistical conclusions cannot be trusted. This 
feature of Bayesian statistics is however very interesting because it provides a good measure 
of the robustness of a fit; one should not consider the fit definitive if it strongly depends 
upon priors. In this case the data are not constraining enough and/or the model under 
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consideration is so complex that a more detailed analysis of its parameter space is required 
(for a detailed discussion of the impacts of priors, but in a different context, see [^). One 
other way to check whether the results of a fit are independent of priors is to compare 
them with the results of a frequentist analysis. In cases where the frequentist measures 
(such as the profile likelihood) do not point to similar preferred values for parameters and 
uncertainties around them, we can conclude that the model is not constrained properly 
with the data, and priors have strong impacts on the results. 

The other interesting reason why one should consider both Bayesian and frequentist 
measures, such as marginal posteriors and profile likelihoods, respectively, in any scan, is 
that they probe different properties of the parameter space. Marginal posteriors are sensi- 
tive only to the total posterior probability mass in the sub-space that has been marginalized 
over, and they consequently do not probe fine-tuned regions with spike-like likelihood sur- 
faces. Frequentist measures, such as profile likelihoods, are on the other hand (by construc- 
tion) strongly sensitive to such regions. Considering both marginal posteriors and profile 
likelihoods therefore enables us to acquire a complete picture of the favored parameter 
regions and statistical properties of the model. 

As we discussed above, most of the state-of-the-art scanning algorithms that are widely 
used in cosmological data analysis are based on either MCMCs or nested sampling tech- 
niques (as we use in the analysis of the present paper). All such methods are designed and 
optimized for Bayesian inference as their main objectives are to calculate either the poste- 
rior PDFs or the Bayesian evidence (or both). Since there are no competitive algorithms 
(in terms of speed, efficiency and accuracy) that are optimized for the frequentist approach, 
it is very common to use those Bayesian algorithms also to map the likelihood of the model, 
the quantity that is needed for frequentist inference. This gives another reason why one 
has to deal with Bayesian inference (though implicitly) even if s/he is interested only in 
frequentist statistics. In principle, results of frequentist inference are independent of prior 
assumptions. This is however the case only if an accurate construction of the likelihood 
function is available, which in turn depends on how powerful the employed scanning algo- 
rithm is. Using algorithms that are optimized for Bayesian exploration of the parameter 
space then implies that any mapping of the likelihood function based on such scans can in 
principle be strongly impacted by prior effects (for more discussions, see e.g. [70|] ). 

Ideally, one wants to compute both marginal posteriors and profile likelihoods by 
sampling the parameter space. Our discussion in the previous paragraph illustrates that 
MCMC-based sampling techniques would give a good estimate of the former, but they 
may not be sufficiently accurate in providing the latter. One may be able to get a better 
estimate of the maximum values of the likelihood at each point in the parameter sub-space 
of interest by generating a much larger number of samples, but with typical numbers of 
Monte-Carlo samples generated by standard algorithms, the estimated values may be too 
far from the actual ones. The reason simply is that the algorithms may miss the spike- 
like maxima as they are not sensitive to the fine-tuned regions. One can however obtain 
sufficiently accurate values for the "mean" values of the likelihood along the unwanted 
parameters (as defined in |^l|), even from a small number of samples. Since in the present 
paper we will give our estimates of the parameter values only in terms of marginal pos- 
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teriors, and frequentist measures are used only to assess the degree of dependence of the 
Bayesian results on the priors (we do not aim to compute frequentist confidence regions 
and intervals), we plot the mean likelihoods, instead of the profile likelihoods, on top of 
the posteriors. We still expect a good agreement between marginal posteriors and mean 
likelihoods for well constrained parameters |81|. If the posterior PDF is Gaussian or is 
separable with respect to the sub-space of the parameter space for which the posterior 
has been marginalized, the mean likelihood will be proportional to the marginal posterior. 
Mean likelihoods are however not purely frequentist quantities, as the posterior PDF is 
used in their definition. They can additionally help us acquire more information about the 
shape of the distribution along the directions over which we marginalize the distribution. 
The marginalization procedure looses all such information in particular about the skewness 
with respect to the marginalized dimensions, which is important for understanding possible 
correlations between the parameters; this information can be to a great extent recovered 
by plotting the mean likelihoods in addition to the marginal posteriors. 

We end this section by a discussion about the construction of the model likelihood 
that is used in constraining the model. The scanning algorithm we use in our analysis (the 
MultiNest algorithm) works based on the calculation of the likelihood (or the x^) value at 
each point in the parameter space. This is done by summing over all values for different 
contributions from cosmological measurements of section 3J when Gaussian distributions 
are assumed for uncertainties around the measured values. In cases where a point in 
the parameter space, i.e. a set of parameters, gives unacceptable values for a computed 
quantity (such as negative values for y or as discussed in section O), we assume a very 
large value for the (o^ ^ very small value for the likelihood) ; this automatically disfavors 
such points. Also, in cases where the parameters give more than one solution for the initial 
value of y (at z = 0), we compare the values obtained for all the solutions and retain 
the one that gives the lowest value. 



4. Results and discussion 



In the previous sections, we introduced the bimetric theory of gravity and the specific 
model that we aim to constrain using cosmological data. In addition, we presented the 
data set, the parameter space of the model, statistical methods and measures, as well as 
the scanning algorithm that we employ in our analysis. In the following sections, we present 
our results for particular sub-spaces of the full parameter space, as well as the full theory. 
We also discuss our findings and some subtleties that must be taken into account when 
interpreting the results. At the end of this section, we summarize our statistical results for 
all different models in table |2|. 

4.1 ACDM as the reference model 

In order to assess how well the bigravity model fits the data, it is useful to compare the 
results to those of a reference model, with the obvious choice of the standard ACDM model. 
We know that the ACDM model is in perfect agreement with all existing cosmological con- 
straints, in particular at the background level. This makes the model phenomenologically 
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Figure 2: Onc-dimensional marginal posterior PDFs (black solid curves) and mean likelihoods 
(red dots) for the parameters (left panel) and 51a (right panel) in the (flat) ACDM concordance 
model fitted to the cosmological data. 68% (la) and 95% {2a) credible intervals for the parameters 
are shown by blue (inner) and green (outer) vertical dashed lines, respectively. 



very interesting and so far the best way of parameterizing the evolution of the Universe. We 
therefore expect any alternative models that agree with observations to effectively mimic 
the ACDM at late times; our bigravity model is no exception. 

In order to be consistent with the assumptions in the bigravity case, we define our 
ACDM model in terms of two free parameters $7^ (present value of the matter density 
parameter) and Hq (present value of the Hubble parameter). Assuming a flat universe, 
JIa will be a derived quantity and not a free parameter (Oa = 1 — ^m)- We scan this 
two-dimensional parameter space using the data and methods we described in the previous 
sections. The flat (top-hat) prior ranges we choose for the parameters are [0, 1] and [50, 100] 
km s~^ Mpc~^, for Q,^ and Hq, respectively. 

The value that we find at the best-fit point is 546.54, which corresponds to a p-value 
of 0.8709. This value shows, as expected, that the observed is less than la away from 
the predicted value, an indication that the model is very well consistent with the data. 
The value of logZ [Z being the Bayesian evidence) we have obtained for the model is 
—278.50. In addition, fig. ^ shows the one-dimensional marginal posterior PDFs (black 
solid curves) and mean likelihoods (red dots) for $7^ and r^A as the result of our analysis. 
The blue (inner) and green (outer) vertical dashed lines indicate 68% (la) and 95% (2(t) 
credible intervals, respectively. Both marginal posterior and mean-likelihood curves are 
normalized to their values at their peaks. Both curves have Gaussian forms and perfectly 
match, another indication that the model fits the data very well and its parameters are 
observationally well constrained. 

4.2 No explicit cosmological constant (Bq = 0) 

Let us now turn to our bigravity model. Looking again at the model's action, eq. (2.1), we 



see that the parameter /3o (and correspondingly Bq) is nothing but a cosmological constant 
term for the physical metric g (with I^a = ^o/3). In fact, by setting all other /3s to zero, it 
is obvious that our bigravity model will reduce to the standard ACDM model (it can be seen 



also from eq. (3^) for the Hubble parameter). We know from the observational constraints 



on the standard model that one gets a very good fit to the data if such a term is present. 
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This constant term has been proposed as the standard source of cosmic acceleration at 
late times. As we stated in section |l|, this assumption has been strongly challenged by 
both cosmology and particle physics considerations, mainly because of the difficulty of 
explaining its small but not zero value compared to the theoretical predictions without 
the need for a huge fine-tuning. Finding an alternative to the cosmological constant that 
explains the late-time acceleration of the Universe has therefore been of great interest. In 
order to see whether one can get an accelerated universe from our bimetric modification of 
gravity without a cosmological term (a self-acceleration mechanism), we set the parameter 
Bq to zero. We then try to fit the other parameters of the model to the data examining 
how well the model can explain the cosmic acceleration. 

In addition, before we analyze the full system, it is helpful to study its sub-systems 
where only one or a couple of the parameters Bi are nonzero and free to vary. This 
may help us to understand the dynamics of the model better, for instance, which terms 
are necessary for self-acceleration giving a good fit to the data, which parameters can be 
constrained observationally, and which ones remain unconstrained. In addition, such a 
step-by-step analysis will be helpful in understanding possible correlations between the 
parameters. 

4.2.1 Sub-models with tw^o free parameters: {B2,^m)^ (^3>^m) 

We start with sub-sets of the parameter space ( |3.1lD where all /3s (or Bs) are set to zero 
except one of them. We have already assumed Bq is fixed to zero (again, it is obvious that 
a model with only nonzero Bq and il^ will yield good fits to observations, as this is just the 
ACDM model that we studied in section 4.1). Keeping Q,^ a free parameter, we therefore 



have four choices for the reduced two-dimensional sub-space, i.e. cases with Bi, B2, -B3, 
or B4 being free. Clearly having only B4 non-zero will not yield an accelerated universe 
and hence a good fit. It is easy to see this by observing that with only B4 / 0, eq. ( p^ ) 
is nothing but the standard Hubble equation containing only matter and radiation. Such 
a model does not fit the set of cosmological observations. The interesting cases to study 
are therefore those of free Bi, B2 or B^ (when the matter density parameter Q,^ is also 
allowed to vary). 

• Bi and 0^ nonzero: The one-dimensional posterior PDFs and mean likelihoods for this 
case are depicted in fig. ^, where the left (right) panel corresponds to (Bi). As for the 
ACDM case, the black solid curves (red dots) show the posterior PDFs (mean likelihoods), 
and the blue (green) vertical dashed lines indicate 68% (95%) credible intervals. The two- 
dimensional posterior PDF in the il^-Bi plane is given in fig. ^ where the inner (outer) 
contours correspond to 68% (95%) credible regions. The Gaussian shapes of the curves in 
fig. 1^ as well as the observation that the posteriors and the mean likelihoods perfectly 
match indicate that the model is well constrained. Here, we have used fiat priors in our 
scans with prior ranges of [0,1] and [—5,5] for fl^ and Bi, respectively. The impact of 
increasing the scanning range for Bi on the plots is negligible, which is another indication 
that the analysis is statistically robust and the constraints upon the model parameters are 
reliable. 
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Figure 3: As in fig. |[ but for tlie bigravity model witli only Bi and r2^„ varying (other parameters 
are set to zero). The constraint on the value of graviton mass (m) in this model is = (1.45 ± 
0.25)H^, where Hq^71± 2.5 km s^^ Mpc'^ (right panel). 



The value at the best-fit point is 551.60, with the corresponding p- value of 0.8355. 
The ill this case is slightly bigger than that of the ACDM model, but the p- value shows 
that the observed x^, assuming that the (Si, r2^)-bigravity is the true model, is still less 
than Icr away from the predicted value. We conclude that the model is perfectly consistent 
with observations. 

With the mentioned chosen parameter ranges for the scans, the value of logZ we get is 
—281.73, which is smaller than that of the ACDM. An immediate conclusion might be that 
the ACDM is favored compared to our bigravity model, since the AZ (evidence difference 
between the two models) is more than 3. One should however be cautious in this, as 
first of all the Bayesian model selection procedure we described in section 3.4 is based on 
not only the evidence ratio (or log-evidence difference), but also on our priors about the 
models before observing the data (see eq. ( |3.14| )). This includes all previous observational 
constraints on the models as well as the theoretical preferences. We do not consider any 
previous observational prejudices about the models here, but theoretically one may argue 
that the bigravity model is preferred to the ACDM model because the latter (although 
very well consistent with observations) strongly suffers from purely theoretical problems 
(the measured value for Qa is not technically natural from a particle physics point of view). 
In addition, the prior ranges one chooses for the parameters to scan over can change the 
calculated value of the evidence. We can reduce the range for Bi in our scans and get 
better evidence values. For example, as we will see below, there are reasons why we expect 
Bi to possess positive values for the considered sub-set of the full model to agree with 
observations; this reduces the prior range by a factor of two. Other possible theoretical 
reasons may reduce the prior range even further, giving rise to an even larger evidence. 
For these reasons we do not use the Bayesian model selection approach in comparing the 
bigravity model with the ACDM in the present paper. We will however use it later when 
we investigate how our fits may improve by allowing more parameters of the model to vary. 
This would tell us whether increasing the dimensionality of the parameter space could help 
the model to explain the data better or not (Occam's razor). 

In order to better understand why the model gives a good fit to the data in this 
particular case, let us look at the dynamics of the scale factor ratio y at the best-fit point 
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Figure 4: Two-dimensional marginal posterior PDF for the bigravity model with only Bi and 
varying (other parameters are set to zero). The inner and outer contours represent 68% (Icr) and 
95% {2a) credible regions, respectively. 



of the model. Fig. [i| that we used as an example in section 3.2 actually corresponds to 



this particular sub-space of the parameter space. We observe from that plot that y starts 
off from the asymptotic value of zero in the far past (high redshifts) and evolves toward 
another constant value of y ~ 5.8 at the far future. We obtained this value from our 
statistical analysis of the model, but it can be understood also from analytical studies of 
the model, in particular by analyzing eq. ( |3.9| ) at z = —1 (far in the future). The redshift 
dependence of ^Im and tells us that the energy densities of both matter and radiation 
will vanish at z = — 1, and consequently for the {Bi, il^)-model, the quartic equation ( ^^ ) 
reduces to the following asymptotic form and analytical solution for y: 

Biy^-^ = =^ y = yi« 0.577. (4.1) 

This value perfectly matches the one we obtained numerically. Moreover, plugging this 
asymptotic value into eq. (^^) gives an equation for the Hubble parameter that resembles 
that of the ACDM cosmology with only a cosmological constant contribution to the energy 
density of the Universe (i.e. an asymptotically de Sitter universe). The energy density 
parameter in this case will become i?i/\/3. If y were constant over the entire evolution 
of the Universe, this would mean that the model could give a good fit to the observations 
with the value of w 1.2, where fl^'-"'^^^ is the measured value of within the 

ACDM cosmology (~ 0.7). We however know that y is not constant and for all redshifts 
larger than zero (which our measurements actually correspond to) has a value smaller than 
its asymptotic limit. Therefore, in order for the model to compensate for the smallness of 
y, we expect a good fit to the data with a somewhat larger value for Bi at the best-fit 
point. This is exactly what we observe as the result of our statistical analysis, i.e. a fit 
comparable to the ACDM with Bi 1.44. 

Obviously, a more rigorous analysis is needed in order to analytically justify the ob- 
tained value for Bi at the best-fit point (i.e. ~ 1-44), and to understand why this actually 
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gives good fits for the entire time of the cosmic evolution. However, we can get a better 
understanding of this by considering eq. ( |3.9| ) once again for our particular sub-set of the 
model where Bq, B2, -B3 and B4 are set to zero (and the contributions from radiation 
and curvature are neglected). The equation in this case becomes quadratic in y with the 
following solution: 



The model is expected to give a relatively better fit to the data if instead of z = — 1 
(far in the future), it matches the ACDM model at z = (the present time). Since the 
effective Q.\ in this case is Biy (from eq. ( |3.6D ), eq. ( [4. 2]) can be written in the following 
form at 2; = 0: 



20a = -0^ ± ^OoTTIb^ =^ B,=±^\[{l + ^^f-^^J]. (4.3) 

First of all, we must chose a positive value for Bi. The reason is the following: As we 
saw, the effective Vt\ has the value of Biy with a positive value favored by observations (to 
yield cosmic acceleration), y is the ratio of the two scale factors of the theory and has to 
be positive. We therefore observe that positive Bi is favored by data. In order to compute 
the value of Bi, we plug the values of JIa and il^ preferred by the data in the case of the 
ACDM model (i.e. J^a 0.7 and 0^ « 0.3) into eq. (gj). This gives Bi w 1.45 which is 
very close to the value we have obtained numerically. 

Furthermore, by increasing the redshift (moving into the past), Vtm increases and eq. 
( |4.2| ) shows that y always remains positive and real, and it smoothly transitions into y = 
in the extreme past. This means that the energy contributed by y to the dynamics of 
the Universe becomes smaller and smaller compared to the matter contribution at higher 
redshifts and the Universe becomes matter-dominated. The model therefore has a well- 
behaved solution and resembles the ACDM model at larger redshifts. This is another 
reason why the model yields good fits to the data. 

Finally, let us mention one more interesting implication of our results. We have seen 
that assuming only one of the Bs to vary (i.e. i?i), we get a robust constraint on its value 
(see the right panel of fig. |3|). We have already given the value at the best-fit point, i.e 
the highest-likelihood value [Bi ~ 1.44). Marginalizing the full posterior PDF over the 
unwanted parameters (f]^ in this case), gives the Bayesian preferred value for Bi and 
uncertainties upon it: Bi = 1.45 ± 0.25. In the absence of other Bs (or /3s), eq. ( ^J] ) 
indicates that the graviton mass is purely determined by Bi. Using the definition of Bi 
(i.e. m?f3i/HQ) and the obvious choice of /3i = 1 (there is in fact no need to define /3i as 
an independent parameter in this case; it can be absorbed into m?), we can conclude that 
m? = (1.45 lb 0.25)i/Q, where Hq = 71 it 2.5 km s~^ Mpc~^. Obviously, this constraint on 
the graviton mass is true only if our particular construction of the massive gravity theory 
is correct, no explicit cosmological term exists and all /3 parameters (except /3i) are zero. 
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These are strong assumptions that need to be tested observationally or justified theoreti- 
cally. 



• B2 and n'}^ nonzero: The results of our statistical analysis of the model in this case 
shows that the best-fit has the value of 894.0, which corresponds to a p-value smaller 
than 0.0001. This clearly means that the model is strongly excluded by observations (with 
more than 5a confidence) . This is confirmed by the very low value obtained for the Bayesian 
evidence: logZ = —450.25 (compare this with the ACDM or (i?i,0^) models). Let us see 
if we can analytically explain why one does not get a good fit in this case. 

As for the {Bi,Q,^) case, we use eq. ( |3.9| ) to study how y behaves in the (-62,^^™) 
case. The equation becomes a cubic one with the following form: 



y {B2 {y^ - 1) + = 0. (4.4) 
The trivial solution y = is clearly not interesting, and will not yield a good fit, as 



it does not produce any term in eq. (3.6) that mimics the cosmological constant (which 
is essential for giving a good fit). The model in this case is nothing but the CDM model 
(with Q\ = 0). The acceptable solution for y is therefore: 



We again chose positive y in order to have a physically meaningful model. We observe 
from eq. ( |3.6D that at late times we have an effective cosmological term Q,/^ = B2y^. To 
have any hope of obtaining good fits this term must be positive. This implies that B2 must 
be positive. 

We immediately see from eq. (4.5) that as Qm gets bigger than B2, y turns imaginary. 



which is unphysical. Since Clm scales as (1 + z)^ with redshift, we expect this to always 
happen for redshifts bigger than some z* , where 

z* = {§-)'^-l. (4.6) 

In case a value of B2 could be chosen such that z* becomes larger than the maximum 
redshift we have considered in our analysis (i.e. z ~ 1100) we would still be able to get 
a good fit, even though the model would need to be modified at redshifts higher than z* . 
We can check this by trying to find an estimate for B2 at the best-fit point. As in the Bi 
case, this can be done by assuming that B2y^ = Vl^^^^ today. Eq. (^3) then reads 

B2 = nl + ni^''^ = 1. (4.7) 

Therefore, as soon as Vtrn > B2 = 1 the model becomes unphysical. Assuming a value 
of ~ 0.3 for $7^, this corresponds to z* ~ 0.5, which is well within the range of the cos- 
mological data we have used. We therefore conclude that the {B2, f^^)-model cannot yield 
good fits. 
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• ^3 and Vtl^ nonzero: As for the previous case, our statistical analysis shows that the 
model is excluded observationally. The reason is that the best-fit iii this case is 1700.5 
(p-value < 0.0001), which is even larger than that of the (-B2, r2^)-model. The Bayesian 
log-evidence is —850.26 confirming that the model does not fit observations. Again, in 
order to understand the reason analytically, we proceed in the same way as in the previous 
cases, starting from eq. (|3.9D. The equation becomes: 

y (^y^ - Bsy + O = 0- (4.8) 



Again, it is the nontrivial cubic equation in brackets that is a possible solution. To see 
how this behaves we consider the discriminant of the equation: 

A = -Bl ( Bl - ^-nA . (4.9) 



3 V 4 

When A becomes negative, as it will when |r2m > B3, only one root of the cubic 
equation will be real. Hence, it is natural to only consider this solution, which is: 



V 2 3 



+ - ^Bi + \klm - - ^Bl 



9 



(4.10) 



The effective cosmological constant in this case is (see eq. (|3.6|) ) Q.\ = B^y^/S. A 
good fit to the data in this case means a positive cosmological constant, which given that 
y must be positive, implies a positive value for B^. Eq. ( 4.10| ) is however not consistent 



with both B3 and y being positive. This means that the model cannot fit the data if A 
becomes negative within the range of the data. The redshift z* beyond which this happens 
can be estimated in a similar manner as for the B2 case: 

At the present time, eq. ( [4.8[ ) reads 

Bsy = n'^ + ni^^^ = l =^ y = ^. (4.12) 

B3 

Plugging this value for y in terms of B3 into the relation fi^*^^^^ = B^y^/S, we get 

^3 ~ (3f^i^^*^)-^ (4.13) 

as an approximation for B3 at the best-fit point. Assuming a value of ~ 0.7 for fl^'-"^^ , 
eqs. (P^ ) and ( pl|) give z* ~ 0.15, which is clearly within the range of the data. This 



implies that the model cannot give good fits to the data and is therefore excluded. 
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Figure 5: As in figs. ^ and||, but for the bigravity model with only Bi, B2 and fij^ varying (other 
parameters are set to zero). Here, (flat) prior ranges for both Bi and B2 are [—5, 5], and for is 
[0,1]. 



4.2.2 Three-parameter sub-models: Vtm and two Bs 

Now that we have some hold on what the different mass terms can do, it is interesting to 
study how two terms may interact to give even better fits. This is especially important since 
the interactions are non-trivial. As we will see, by adding more and more free parameters 
the model shows some degeneracy patterns that lead to a behavior which makes parameter 
fits elusive, while at the same time reveals important facts about the physics of the model. 

We start with the {Bi, B2, ri5^)-model where all Bs, except Bi and B2, are fixed to zero 
and flat priors are imposed on all free parameters. Scanning over the parameter space with 
prior ranges [—5, 5] for both Bi and B2, and [0, 1] for results in the best-fit of 546.52 
corresponding to the p-value of 0.8646; the log-evidence is —279.77. The improvement 
in both p- value and evidence, compared to the r2^)-model, indicates that opening 
up a new dimension in the parameter space helps the theory fit the data better. The 
improvement in the evidence is of particular interest since adding more free parameters to 
a model can in general lower the evidence, as the prior volume increases, unless this effect 
is compensated by significant increase in the number of high-likelihood points. 

However, the one-dimensional posterior PDFs and mean likelihoods for the model 
parameters in this case tell us that the model is not well constrained by the data (see fig. 
|5|). seems to be well constrained. Even though Bi also seems to be constrained, the 
mismatch between the posterior PDF and mean-likelihood curves, as well as their non- 
Gaussian shapes are a sign that the constraint may not be trusted. For B2, both curves 
show that negative values are preferred, although they do not match very well. One may 
think that this is because the prior ranges, at least for B2, have not been chosen large enough 
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Figure 6: As in fig. ^, but for larger prior ranges for Bi and B2 ([—1000, 1000]). The prior range 
for ri°j is kept the same ([0, 1]). 



and by increasing them Gaussian curves around preferred values of the parameters would 
show up. Fig. ^ however demonstrates that this is not the case. Here, we have enlarged 
the prior ranges for both Bi and B2 to [—1000, 1000] and we still see a similar pattern for 
the B2 posterior PDF, now extended to even more negative values. The posterior PDF and 
mean-likelihood curves for both Bs now show completely different patterns, an indication 
that the full posterior PDF of the model is not a Gaussian distribution; in addition it 
indicates that the posterior is not separable with respect to the Bi and B2 sub-spaces [^ . 
The differences in the Bi curves indicate non-Gaussianity, which can be due to the fact that 
at least one of the marginalized parameters {B2 in this case) is skewing the distribution in 
its direction. If we fixed B2 at its maximum likelihood value, the marginalized posterior 
distribution would change in the direction of its mean-likelihood curve, a feature that we 
observed in the i7^)-model. The different posterior and mean-likelihood curves for B2 
similarly show that the distribution is skewed in the direction of another parameter of the 
model (Bi). In other words, the model parameters are correlated. 

In order to see the correlation between Bi and B2 more clearly, we show in fig. ^ the 
distribution of the obtained sample points in the B1-B2 plane, for both scans with [—5, 5] 
and [—1000, 1000] prior ranges; the strong correlation can be observed from these plots. 
This simply means that in order for the model to yield good fits to the data, the point 
{Bi,B2) must be located on the narrow correlation region given by fig. |. This is a clear 
example of the fact that one-dimensional marginalized curves do not always reflect all the 
interesting properties of model parameters, in particular when non-trivial correlations exist 
between the parameters. 

The parameter is much more robust under the change of the priors, with Gaussian- 
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Figure 7: Two-dimensional unweighted distributions of the sample points in the B1-B2 plane for 
the bigravity model with only Bi, B2 and fjjjj varying (other parameters are set to zero) and with 
different choices of prior ranges. These illustrate how the two parameters are strongly correlated. 



like shapes for both marginalized posterior and mean-likelihood curves; the curves also 
look very much the same. A small correlation between and the Bs however makes the 
distributions slightly move if we change the prior ranges, as we observe from figs. || and |6|. 

For the S3, r25^)-model, similar conclusions can be obtained. The best-fit for 
the Bi and B3 prior ranges of [—5, 5] (flat priors) is 542.82, with the p-value of 0.8878; the 
log-evidence is —280.10. Note that in terms of the p- value, the model is fit to the data 
even better than the ACDM (with the p- value of 0.8709). In addition, the improvement 
in the evidence compared to the {Bi, 0,'^), even though it has more free parameters, is an 
indication that the (i?i, -63, rijjj-model is observationally favored to the (i?i , r2^)-model. 
In addition, our statistical analysis shows that, as in the previous case of the (Si, i?2, ^^m)- 
model, the parameters of the model are again correlated in a very similar way. Therefore, 
for brevity reasons, here we only show the two-dimensional distribution of the sample points 
in the B1-B3 plane for prior ranges of [—1000, 1000] (left panel of fig. where a strong 
correlation between the two parameters can be seen. 

For the (S2, -B3, i7^)-model we get the best-fit of 548.04 (p-value of 0.8543) and 
the log-evidence of —280.91 (with prior ranges of [—5,5]); these show a very good fit to 
the data. This is a particularly interesting case as our analysis showed (see the previous 
section) that the {B2, i^^) i^Si ^m) models did not fit the data individually and were 
therefore excluded. The parameters of the model in this case are also correlated. Fig. ^ 
(right panel) illustrates this correlation in the B2-B3 plane. 

Let us now try to analytically understand the correlation patterns observed in figs. 
and ^ for the models (i?i, B2, ^m)^ (-^i) -^3) ^m) ^.nd {B2, -B3, fi^). The complexity of our 
bimetric model increases considerably when we increase the number of model parameters 
and a thorough analytical consideration of each of the cases (as we did in the previous 
section for the two-parameter models) is beyond the scope of this work. Here, we there- 
fore only consider one of them (the {Bi, B2, r2^)-model) in order to get an understanding 
of what causes the parameters to be strongly correlated with those specific correlation 
patterns. The other cases can be studied in similar ways. 

As for the discussions of the previous section, the key expression to consider here is 
eq. ( |3.9D for the dynamical variable y. This equation for the (Si, S2, r2^)-model reduces 
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Figure 8: Left: Two-dimensional unweighted distribution of the sample points in the B1-B3 plane 
for the bigravity model with only _Bi, B3 and fi^^ varying. Right: Similar distribution in the B2-B3 
plane for the model with only B2, -B3 and varying. 



to the following one: 



B2y^ + Biy^ 



0. 



(4.14) 



This is a cubic equation for which we again consider the discriminant. This turns out 
to be: 



A 



-WlB2^m + 4.BfBl + -Bf + Bfnl, - 4^2^^ + 12S|0^ 



12S|Om + 4S|. (4.15) 



We see that if B2 is negative, the discriminant will be positive, and eq. (4.14) will 
always have three roots for y. In the opposite case (i.e. positive B2), we will always be 
able to move back to a time when the discriminant was negative and there was only one 
root. In this case, consideration of the real root reveals that it will at some point in the 
past (high redshifts) become negative if Bi is negative, whereas this will not happen if the 
theory has positive Bi in addition to the positive B2- In addition, looking at the equation 
for the Hubble parameter (p.6|), we see that to get acceleration (required for a good fit to 
the data), both constants cannot be negative simultaneously (since y has to be positive). 
Therefore, the possible theories are the ones with either B2 negative and Bi positive, or 
both Bi and B2 positive. Hence the first conclusion is that Bi must always be positive. 

As in the previous section, we now assume that the model is effectively equivalent 
to the ACDM model with an effective 0^*^^^^ of yBi + y^i?2- This should give a good 
estimate for the best-fit parameters given that y does not vary significantly with time. This 
gives an estimate for y in terms of the best-fit Bi and B2, and the effective Q^'-"^^^ : 



B,±^Bf + 4Qi^'D^'B2 
25^ 



In addition, eq. ( [4.14| ) in this case becomes 



Bi 



y + {n^-B2)y-^ = 0. 



(4.16) 



(4.17) 
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Let us now insert y from eq. ( 4.16 ) into eq. ( [4. IT]) and solve for Bi as a function 



of B2. Assuming that y is calculated at present (z = 0), we can also use the relation 
Q^ACDM _j_ _ -|^8_ therefore obtain the following relation between B\ and B2^: 



3 /j^ACDM(i_5^) 

Bi = == . (4.18) 

V3 - 2B2 ^ ' 

Here, the reality and positivity conditions for Bi imply that B2 < 1- This function 
very well explains the particular correlation pattern observed in fig. |^. The overall shape 
of the function shows a very similar behavior and the only difference is in its normalization. 
The two curves remarkably match if we tune the value of the effective cosmological constant 
Q^ACDM something about 0.57, which is slightly different from the best-fit value in the 
ACDM case (i.e. ~ 0.7). This is however not a big surprise, given that the obtained 
analytical expression for Bi versus B2 is the result of approximating a time-dependent 
quantity, i.e. y, with a constant. 

So far in this section, we have studied three-parameter sub-models where in each case 
two of the three parameters -Bi, B2 and B^ are allowed to vary (in addition to ^^). Three 
more sub- models with the same dimensionality exist: cases where B4 is considered as a 
free parameter together with fi^ and one of the Bi , B2 and B^ . We argued in the previous 
section that the two-dimensional model (-B4, Jl^) is equivalent to the CDM model and 
cannot give acceleration. The situation may however change if we switch on one of the Bi, 
B2 or S3 terms together with the -B4 term. For example, we have already seen that the 
(i?i, f^[^)-model, which is a sub-model of (i?i, i?4, fi^) can perfectly fit the data, meaning 
that we expect a good fit from the (i?i, i?4, rj^)-model too. Our scan for this case with prior 
ranges of [—5,5] for Bi and B4 gives a best-fit of 548.86, corresponding to the j»-value 
of 0.8485. Compared to the (i?i,il^) case, this indicates a slightly better fit. In terms 
of the Bayesian evidence, the log-evidence we get in this case is —281.42 which is again 
better than the one for the {B\,^^) case. The improvement in the evidence {AlogZ) is 
however not sufficiently large (~ 0.3) for the S4 term to be considered essential in providing 
a better fit to the data. For the {B2, B4,Q'^) and {B3, B4,Q,^) models, our results show 
that they are observationally excluded, although there is improvement in both p- value and 
log-evidence in these cases compared to (52,^^^) and (^3,^^). The {B2, B4,nl^)-model 
gives the best-fit of 806.82 (with a p-value smaller than 0.0001) and the log-evidence of 
—420.87. The best-fit for the (i?3, B^, 05li)"™odel is 685.30 (corresponding to the p-value 
of 0.0023) and the log-evidence is —351.14. Finally, in terms of the marginalized posteriors 
and mean likelihoods, all these models again show some degree of correlation between the 



*This assumption simplifies the final relation significantly, while does not afTect the general features 
of the relation. In order to obtain an even better approximation, one could relax this assumption and 
retain both Q'j^^^^' and 0,%^ explicitly. In this case, the expression will clearly be a function of redshift. 
By changing the redshift one can then tune the function such that it perfectly matches the numerically 
obtained correlation curve. This redshift represents a slightly earlier time than today around which the 
measured data points cluster most. 

^In the calculations that lead to this expression, we also make use of our previously acquired knowledge 
of the positivity of Bi . This is important in choosing the correct branch of the equation. 



-30- 




Figure 9: As in figs. |[ ^ and ^, but for tlie bigravity model with only Bi, B4 and varying 
(other parameters are set to zero). Here, (flat) prior ranges for both Bi and B4 are [—5, 5], and for 
f^o, is [0,1]. 



parameters, although the correlations are much weaker than for the models we studied 
earlier in this section. The {Bi, B4,^l^)-model is clearly more interesting since it yields 
a good fit to the data, and in this case, the one-dimensional plots (fig. ^ illustrate that 
Bi and are constrained to some positive values, while for B4 the observations prefer 
negative values. By studying the case with larger prior ranges and the two-dimensional 
plots a weak correlation between the parameters can be seen; here we do not present the 
plots for such cases for brevity reasons. 

4.2.3 Further generalizations 

So far, our studies have been limited to the sub-models of the full bimetric model where 
only two or three model parameters (out of six) vary and the rest are fixed to zero. This 
was however an essential step in understanding the structure of the model, what the roles 
of each parameter are in providing good fits to the observations and how they interact. 
We can in principle continue our studies to higher-dimensional models by using the same 
procedure as before and try to constrain the model parameters in each case. We however 
saw that even for the three-dimensional models, our data seem to be insufficient in placing 
strong constraints upon the parameters. We demonstrated this by plotting one-dimensional 
marginalized posteriors and mean likelihoods and comparing the two when the prior ranges 
were changed. We observed that some parameters were only constrained to be positive or 
negative and there were strong correlations between many of them. This all means that 
adding more and more parameters to such unconstrained models only increases the degrees 
of correlations and the larger models will most probably become even less constrained. For 
these reasons, we do not investigate such models in detail and restrict our studies to only 
reporting the p-value and log-evidence in each case. These quantities, together with the 
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free 


free 





free 


free 


546.52 


0.8581 


-279.56 


(Bl, B3, B4, njjj) 





free 





free 


free 


free 


546.78 


0.8563 


-280.00 


(B2, B3, B4, fjJJi) 








free 


free 


free 


free 


549.68 


0.8353 


-282.89 


(Bl, B2, B3, B4, fjjii) 





free 


free 


free 


free 


free 


546.50 


0.8515 


-279.60 


full bigravity model 


free 


free 


free 


free 


free 


free 


546.50 


0.8445 


-279.82 



Table 2: Best-fit x^, p- value and log-evidence for different bigravity sub- models when the models 
are constrained by SNc, CMB, BAO and Hq measurements at the background level. For each 
model, parameters that are allowed to vary are marked as "free" ; the non- varying parameters are 
fixed to zero. The models are named by reduced parameters Bs, defined as Bi ~ m?l3i/HQ. Prior 
ranges used in the statistical scans are chosen to be [—5, 5] for Bs and [0, 1] for fi^. The sub-model 
in the uppermost row with only i?o and $7^ being free is equivalent to the ACDM concordance 
model of cosmology. The "full bigravity model" , i.e. the one in the lowermost row, correspond 
to the full parameter space where all model parameters (including the cosmological term Bq) are 
allowed to vary. In all models, except the ACDM, the present value of the Hubble parameter, Hq, 
is a prediction of the model and is determined in terms of the other parameters. For the ACDM 
case, Hq is a free parameter which has been constrained by cosmological data. 

ones for the previous models are presented in table ^. These include the four-parameter 
models (Bi,S2,B3,J7^), (Si, 52,^4,0^), (^1,53,^4,^^^) and S3, S4, fi^), as well 
as the five-parameter model i?2) -B3, i?4, rij^). Our results show that all these models 
are in very good agreement with observations, as far as the p- values are concerned, and in 
some cases are more favored compared to the lower-dimensional models from the Bayesian 
point of view (see the log-evidence values in table 

4.3 Including a cosmological constant: the full theory 

Our main objective in this paper has been to investigate whether the ghost-free, massive, 
bimetric gravity, represented by the action (^]^), can give rise to a late-time accelerated 
universe without any need to an explicit cosmological constant term. For this reason, we 
fixed the free parameter Bq to zero in all the analyses we have performed so far. In order 
to be as comprehensive as possible, here we briefly study the full model, i.e. when the Bq 
term is also allowed to vary. We expect a very good fit in this case, since the previously 
considered models that are well fit to the data, including the ACDM model, are basically 
all sub-models of this six-dimensional model. 
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Table |2| includes the results of our analysis for this case (lowermost row). The best- 
fit value we have obtained for the case where a prior range of [—5, 5] is used for all 
Bs (while the prior range for is chosen to be [0, 1]) is 546.50, which corresponds to 
a p- value of 0.8445. Compared to the ACDM model, we have obtained a better even 
though the p- value is smaller. This is natural however regarding the fact that the full 
bigravity model has more free parameters than the ACDM. Our results show that there is 
at least one sub-model (the {Bi, B3,Q^) case) with a better best-fit (= 542.82). Since 
this model is obviously a sub-model of the full model, we expect this value to be found 
even in the latter case. We believe that the reason for observing the contrary is simply 
that the particular best-fit point in the {Bi, i?3, r2^)-model is highly fine-tuned so that our 
scanning algorithm has not been able to probe it when the parameter space has become 
much larger. We discussed this issue in section 3^ where we introduced our scanning 
technique. We mentioned that the technique is optimized for Bayesian inference where 
the value of the global maximum likelihood is not important as long as a huge number of 
such high-likelihood points do not exist and the posterior mass is not affected by them. In 
addition, the log-evidence obtained for the full model is —279.82, which compared to the 
value for the ACDM model (i.e. —278.50) indicates that the model is less favored. This 
is not a big surprise though because we know that the ACDM, as a sub-model of our full 
bigravity model, is in perfect agreement with background observations. In the absence of 
any theoretical preference of a model over the other, Bayesian inference naturally selects 
the lower-dimensional one. 



5. Conclusions and outlook 



In this paper, we have performed a thorough and extensive parameter estimation and 
model comparison for the ghost-free, bimetric theory of massive gravity by comparing the 
predictions of the model to various cosmological measurements at the background level. 
To our knowledge, this has thus far been the most exhaustive statistical analysis of the 
model. 

We assumed the two metrics of the model to be spatially flat, homogeneous and 
isotropic, and only one metric was considered as the physical one coupled to matter, while 
both metrics were dynamical. We broadly discussed different observational data used in 
the analysis (SNe, CMB and BAO), dynamical equations and initial conditions used in nu- 
merically computing various cosmological distances, and statistical frameworks employed 
in constraining the model. We used nested sampling to explore the model's parameter 
space, find the maximum likelihood points, calculate the Bayesian evidence, compare dif- 
ferent sub-models and constrain the model parameters. In many places, we complemented 
the results of our statistical studies with detailed analytical explanations; this helped us 
understand various interesting features observed in the results. 

The main objective of our analysis has been to investigate the possibility of obtaining 
a late-time acceleration of the Universe, within the bigravity framework, that is consistent 
with observations. For this reason, we mainly focused on the interesting case where no 
explicit cosmological constant was assumed. In the absence of any theoretical restrictions 
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on the possible values of the free parameters of the model, we chose somewhat arbitrary, 
but justified, ranges for our scans. In order to understand the roles that the different 
parameters play in the dynamics of the model, we have extensively studied various sub- 
models of the full model where only some of the free parameters have been allowed to 
vary. 

Our results show that the bimetric model can in general yield very good fits to the 
observed data at a very high confidence level. In other words, the model can produce the 
cosmic acceleration without the need to resort to a cosmological constant term. The reason 
is that the parameter space of the model contains many points for which the model behaves 
very much like the ACDM, i.e. effectively gives a cosmological constant, at least at the 
background level. This similarity to the ACDM makes the model be in agreement with the 
observations. Although the ACDM model remains to be slightly favored by observations, 
both models are statistically perfectly consistent with the data. Even though the value of 
the Bayesian evidence for the ACDM model compared to the values for the bigravity model 
(and its sub-models) suggest that the former is preferred from the Bayesian point of view, 
we argued that this should be viewed with caution. Our argument was mainly based on 
the impacts our prior choices of parameter ranges and our prior preferences for different 
models over the others could have upon such conclusions. 

Wc additionally observe that some particular sub-models with only some parameters 
varying are ruled out by the data. The first model in this class is the one with only the 
quartic (/34) mass term in the action being allowed to vary and the other parameters are 
fixed to zero. Other similarly excluded sub-models are the ones where only the quadratic 
(/32) or cubic (/Sa) mass terms in the action are present; turning on both of the terms at 
the same time however brings the theory back into the game. Models with only quadratic 
and quartic, or cubic and quartic terms are also excluded. Finally, it turns out that the 
most important term for obtaining a good fit is the linear ) term which can individually 
produce a cosmic evolution in perfect agreement with the cosmological observations. 

As far as the observational constraints on the values of the parameters are concerned, 
one-dimensional marginalized posterior probabilities and mean likelihoods reveal that, ex- 
cept the sub-model with only the linear term existing, in all other viable sub-models the 
free parameters are correlated. The correlations are in some cases so strong that the one- 
dimensional probability plots are not reliable. Although in these cases we cannot determine 
the values of the parameters, the correlation patterns indicate that some parameters are 
preferred to be positive or negative. In the linear-term-only sub-model, where we have 
reliable constraints, the graviton mass can be determined and turns out to be of the order 
of the present value of the Hubble parameter (which is not a big surprise) . 

In cases where our prior ranges for the parameters are wide enough, our results show 
that they can take on arbitrarily large or small values (respecting the constraints on their 
signs) while giving very good fits. In this case, the model tends to become as similar to the 
ACDM as possible where the time evolution of the effective dynamical variable of the theory 
(y) becomes negligible. Since for a good-fit model, y starts off with an asymptotically zero 
value in the far past, its value must remain vanishingly small in the far future if its variation 
with time is to be tiny. This explains why the absolute values of the parameters are required 
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to be infinitely large in those cases: the parameters are always multiplied by different powers 
of y in the evolution equations and their large values compensate for the smallness of y in 
order to give a cosmological constant value compatible with observations. Working only 
with the background dynamics of the model and using only the geometrical measurements 
of the cosmic evolution (as we have done in the present work) cannot determine how small 
the time variation of y must be. Currently, the model is consistent with the data even in 
cases where y changes considerably with time. Additional cosmological or astrophysical 
constraints on the model are required to break the degeneracy between the parameters and 
determine how different the model wants to be from the ACDM. 

Perhaps the most natural extension of the present work would be to repeat the analysis 
performed here for the case where perturbative equations are used. One can in this case use 
a wealth of data available from the measurements of the CMB anisotropics and the growth 
of large-scale structure of the Universe to further constrain the model. This could tell us 
whether the bimetric model with nontrivial dynamics could match the data as well as or 
better than the ACDM without preferring the most ACDM-like corners of the parameter 
space. In particular, any deviation from a constant equation of state parameter of dark 
energy observed by existing or forthcoming cosmological experiments, could potentially 
favor the model over the standard ACDM and place interesting constraints on its parame- 
ters. Other astrophysical tests of the model, in particular on scales much smaller than the 
cosmological ones, such as the solar system or black hole observations, can provide highly 
important additional pieces of information to the analysis to either rule out the theory or 
to constrain its properties. 

And finally, one can generalize the model to the cases where the flatness assumption 
for the curvature of the Universe is relaxed, other more nontrivial types of metrics are 
considered or the physical metric is not identical to only one metric (our g) but is defined 
in terms of a combination of the two metrics {g and /). In addition, one very interesting 
generalization of the model to study would be the case where the second metric (/) is also 
coupled to matter. We leave the investigation of such possibilities for future work. 
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